According to the World Health Organization, healthy communities rely on well-functioning ecosystems. Clean air, fresh water, and nutritious food are inextricably linked to ecosystem health. Changes in biological activity convey important information about ecosystem dynamics, and understanding such changes is crucial for the survival of our species. Scientific edge cyberinfrastructures collect distributed data and process it in situ, often using machine learning algorithms. Most current machine learning algorithms deployed on edge cyberinfrastructures, however, are trained on data that does not accurately represent the real stream of data collected at the edge. In this work we explore the applicability of two new self-supervised learning algorithms for characterizing an insufficiently curated, imbalanced, and unlabeled dataset collected using nine microphones at different locations at the Morton Arboretum, an internationally recognized tree-focused botanical garden and research center in Lisle, IL. Our implementations showed completely autonomous characterization capabilities, such as the separation of spectrograms by recording location, month, week, and hour of the day. The models also discriminated spectrograms by biological and atmospheric activity, including rain, insect, and bird activity, in a completely unsupervised fashion. We validated our findings using a supervised deep learning approach and a dataset labeled by experts, confirming competitive performance on several features. Toward explainability of our self-supervised learning approach, we used acoustic indices and false-color spectrograms, showing that the topology and orientation of the clouds of points in the output space over a 24-hour period are strongly linked to the unfolding of biological activity. Our findings show that self-supervised learning has the potential to learn from and process data collected at the edge, characterizing it with minimal human intervention. We believe that further research is crucial to extending this approach toward fully autonomous characterization of raw data collected by distributed sensors at the edge.
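As a loose illustration of the kind of unsupervised grouping described above (not the paper's actual self-supervised algorithms), the Python sketch below extracts log-mel spectrogram features from audio clips and clusters them. The synthetic clips, the time-averaged feature, and the cluster count are all illustrative assumptions.

```python
# Minimal sketch: spectrogram features plus unsupervised clustering of
# field audio. This is NOT the paper's self-supervised pipeline; the
# clips, feature choice, and cluster count are illustrative assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def log_mel(y, sr=22050, n_mels=64):
    """Return a log-mel spectrogram for one audio clip."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S, ref=np.max)

# Stand-in clips (in practice: field recordings from the nine microphones).
rng = np.random.default_rng(0)
sr = 22050
clips = [rng.standard_normal(sr * 2) for _ in range(6)]

# Crude per-clip feature: time-averaged log-mel energies.
feats = np.stack([log_mel(y, sr).mean(axis=1) for y in clips])

# Cluster the features; the paper instead learns the representation
# itself with self-supervised objectives before any grouping.
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(feats)
print(labels)
```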
-
Accurate cloud type identification and coverage analysis are crucial to understanding the Earth's radiative budget. Traditional computer vision methods rely on low-level visual features of clouds to estimate cloud coverage or sky conditions. Several handcrafted approaches have been proposed, but scope for improvement remains. Newer deep neural networks (DNNs) have demonstrated superior performance for cloud segmentation and categorization. These methods, however, require expert intervention: engineered preprocessing steps in the traditional methods, or human assistance in assigning cloud or clear-sky labels to pixels for training DNNs. Such human mediation imposes considerable time and labor costs. We present the application of a new self-supervised learning approach to autonomously extract relevant features from sky images captured by ground-based cameras, for the classification and segmentation of clouds. We evaluate a joint-embedding architecture that uses self-knowledge distillation plus regularization. We use two datasets to demonstrate the network's ability to classify and segment sky images: one with ∼85,000 images collected from our ground-based camera and another with 400 labeled images from the WSISEG database. We find that this approach can discriminate full-sky images based on cloud coverage, diurnal variation, and cloud base height. Furthermore, it semantically segments the cloud areas without labels. The approach shows competitive performance in all tested tasks, suggesting a new alternative for cloud characterization.
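The sketch below illustrates the general self-distillation pattern behind such joint-embedding training: a student network is matched against a momentum-averaged (EMA) teacher on two views of the same image. The tiny networks, temperatures, and momentum value are illustrative assumptions, not the paper's configuration, and real training would add augmentations and the regularization term the abstract mentions.

```python
# Minimal sketch of a self-distillation (DINO-style) objective on two
# views of a sky image. Sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                        nn.ReLU(), nn.Linear(256, 128))
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                        nn.ReLU(), nn.Linear(256, 128))
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad = False  # teacher is updated by EMA, not gradients

def distill_loss(view1, view2, t_s=0.1, t_t=0.04):
    """Cross-entropy between sharpened teacher and student outputs."""
    s_out = F.log_softmax(student(view1) / t_s, dim=-1)
    t_out = F.softmax(teacher(view2) / t_t, dim=-1).detach()
    return -(t_out * s_out).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(m=0.996):
    """Move each teacher parameter toward its student counterpart."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

x1 = torch.randn(8, 3, 64, 64)  # stand-ins for two augmented views
x2 = torch.randn(8, 3, 64, 64)
loss = distill_loss(x1, x2)
loss.backward()
ema_update()
print(float(loss))
```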
-
Phase correlation (PC) is a well-known method for estimating cloud motion vectors (CMVs) from infrared and visible spectrum images. Commonly, the phase shift is computed in small blocks of the images using the fast Fourier transform. In this study, we investigate the performance and stability of the blockwise PC method while varying the block size, the frame interval, and combinations of red, green, and blue (RGB) channels from the total sky imager (TSI) at the United States Atmospheric Radiation Measurement user facility's Southern Great Plains site. We find that shorter frame intervals, followed by larger block sizes, are responsible for stable estimates of the CMV, as suggested by higher autocorrelations. The choice of RGB channels has a limited effect on the quality of CMVs, and the red and grayscale images are marginally more reliable than the other combinations during rapidly evolving low-level clouds. The stability of CMVs was tested at different image resolutions with an implementation of the optimized algorithm on the Sage cyberinfrastructure test bed. We find that doubling the frame rate outperforms quadrupling the image resolution in achieving CMV stability. The correlations of CMVs with wind data are significant, in the range of 0.38–0.59 with a 95% confidence interval, despite the uncertainties and limitations of both datasets. A comparison of the PC method with constructed data and the optical flow method suggests that post-processing of the vector field has a significant effect on the quality of the CMV. Raindrop-contaminated images can be identified by the rotation of the TSI mirror in the motion field. The results of this study are critical for optimizing algorithms for edge-computing sensor systems.
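The FFT-based core of phase correlation is compact enough to sketch directly. The NumPy example below recovers a known shift between two frames from the normalized cross-power spectrum; the frame size and test shift are illustrative, and the study's pipeline adds blockwise tiling, channel selection, and vector-field post-processing on top of this step.

```python
# Minimal sketch of phase correlation between two frames. The peak of
# the inverse FFT of the normalized cross-power spectrum gives the shift.
import numpy as np

def phase_correlation(block1, block2):
    """Return (dy, dx) such that block1 ~= block2 shifted by (dy, dx)."""
    F1 = np.fft.fft2(block1)
    F2 = np.fft.fft2(block2)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12          # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                          # map wrap-around indices
        dy -= h                              # to signed shifts
    if dx > w // 2:
        dx -= w
    return dy, dx

rng = np.random.default_rng(0)
frame1 = rng.random((64, 64))
frame2 = np.roll(frame1, shift=(3, -5), axis=(0, 1))  # known motion
print(phase_correlation(frame2, frame1))              # expect (3, -5)
```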
-
Cloud cover estimation from images taken by sky-facing cameras can be an important input for analyzing current weather conditions and estimating photovoltaic power generation. The constant change in the position, shape, and density of clouds, however, makes the development of a robust computational method for cloud cover estimation challenging. Accurately determining the edges of clouds, and hence the separation between clouds and clear sky, is difficult and often impossible. Toward determining cloud cover for estimating photovoltaic output, we propose using machine learning methods for cloud segmentation. We compare several methods, including a classical regression model, deep learning methods, and boosting methods that combine the results of the other machine learning models. To train each model on a variety of sky conditions, we supplemented the existing Singapore whole-sky imaging segmentation database with hazy and overcast images collected by a camera-equipped Waggle sensor node. We found that the U-Net architecture, one of the deep neural networks we evaluated, segmented cloud pixels most accurately. High cloud-segmentation accuracy, however, did not guarantee accurate estimates of solar irradiance. We confirmed that the cloud cover ratio is directly related to solar irradiance and that solar irradiance in turn is closely related to solar power output; hence, by predicting solar irradiance, we can estimate solar power output. This study demonstrates that sky-facing cameras combined with machine learning methods can be used to estimate solar power output. This ground-based approach provides an inexpensive way to understand solar irradiance and estimate production from photovoltaic solar facilities.
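Once a segmentation mask is available, the cloud cover ratio that the study relates to irradiance is a simple pixel fraction. A minimal sketch follows, using a random stand-in mask rather than real U-Net output; the valid-sky mask is also a hypothetical placeholder.

```python
# Minimal sketch: cloud cover ratio from a binary segmentation mask.
# The mask is random stand-in data; in practice it would come from the
# trained segmentation model.
import numpy as np

rng = np.random.default_rng(1)
mask = rng.random((480, 640)) > 0.6              # 1 = cloud, 0 = clear sky
sky_region = np.ones_like(mask, dtype=bool)      # hypothetical valid-sky mask

cloud_cover = mask[sky_region].mean()            # fraction of cloudy pixels
print(f"cloud cover ratio: {cloud_cover:.2%}")
```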
-
Long-term observations of cloud and precipitation fall speeds are needed for validating and improving rainfall forecasts from climate models. To this end, the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) user facility Southern Great Plains (SGP) site at Lamont, Oklahoma, hosts five ARM Doppler lidars that can measure cloud and aerosol properties. In particular, the ARM Doppler lidars record Doppler spectra that contain information about the fall speeds of cloud and precipitation particles. Because of bandwidth and storage constraints, however, the Doppler spectra are not routinely stored. This calls for automating cloud and rain detection in ARM Doppler lidar data so that the spectral data in clouds can be selectively saved and further analyzed. During the ARMing the Edge field experiment, a Waggle node capable of performing machine learning applications in situ was deployed at the ARM SGP site for this purpose. In this paper, we develop and test four algorithms for the Waggle node to automatically classify ARM Doppler lidar data. We demonstrate that supervised learning with a ResNet50-based classifier correctly classifies 97.6% of the clear-air images and 94.7% of the cloudy images, outperforming traditional peak detection methods. We also show that a convolutional autoencoder paired with k-means clustering identifies 10 clusters in the ARM Doppler lidar data. Three clusters correspond to mostly clear conditions with scattered high clouds, and the other seven correspond to cloudy conditions with varying cloud-base heights.
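A minimal sketch of the autoencoder-plus-k-means idea follows: learn a compact code by reconstruction, then cluster the codes. The architecture, image size, single training step, and random stand-in data are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: convolutional autoencoder codes clustered by k-means.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 2, stride=2),
        )

    def forward(self, x):
        z = self.enc(x)                     # compact code
        return self.dec(z), z               # reconstruction and code

model = ConvAE()
imgs = torch.randn(32, 1, 64, 64)           # stand-in lidar "images"
recon, codes = model(imgs)
loss = nn.functional.mse_loss(recon, imgs)  # reconstruction objective
loss.backward()                              # one illustrative step

# After training, flatten the codes and cluster them.
flat = codes.flatten(1).detach().numpy()
labels = KMeans(n_clusters=10, n_init="auto").fit_predict(flat)
print(labels[:10])
```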